Machine Learning Bias, Statistical Bias, and Statistical Variance of Decision Tree Algorithms

Authors

  • Thomas G. Dietterich
  • Eun Bae

Abstract

The term "bias" is widely used (and with different meanings) in the fields of machine learning and statistics. This paper clarifies the uses of this term and shows how to measure and visualize the statistical bias and variance of learning algorithms. Statistical bias and variance can be applied to diagnose problems with machine learning bias, and the paper shows four examples of this. Finally, the paper discusses methods of reducing bias and variance. Methods based on voting can reduce variance, and the paper compares Breiman's bagging method and our own tree randomization method for voting decision trees. Both methods uniformly improve performance on data sets from the Irvine repository. Tree randomization yields perfect performance on the Letter Recognition task. A weighted nearest neighbor algorithm based on the infinite bootstrap is also introduced. In general, decision tree algorithms have moderate-to-high variance, so an important implication of this work is that variance, rather than appropriate or inappropriate machine learning bias, is an important cause of poor performance for decision tree algorithms.
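Voting reduces variance because combining many classifiers, each trained on a different resample, smooths out how much any single prediction depends on the particular training set. A minimal sketch of Breiman-style bagging, assuming one-dimensional inputs, binary labels, and a one-split decision stump as the base learner. The helpers `majority`, `fit_stump`, and `bagging` are hypothetical illustrations; this is neither the paper's tree randomization method nor a full decision tree learner.

```python
import random
from collections import Counter


def majority(labels, default):
    """Most common label (ties go to the label seen first), or `default` if empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(xs, ys):
    """Hypothetical base learner: a one-split decision stump on 1-D data that
    minimizes training errors. Returns a prediction function x -> label."""
    overall = majority(ys, 0)
    vals = sorted(set(xs))
    # Candidate thresholds: one below every value, plus midpoints between values.
    thresholds = [vals[0] - 1.0] + [(a + b) / 2 for a, b in zip(vals, vals[1:])]
    best_err, best = None, None
    for t in thresholds:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        l_lab, r_lab = majority(left, overall), majority(right, overall)
        err = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
        if best_err is None or err < best_err:
            best_err, best = err, (t, l_lab, r_lab)
    t, l_lab, r_lab = best
    return lambda x: l_lab if x <= t else r_lab


def bagging(xs, ys, n_models, rng):
    """Bagging: fit each stump to a bootstrap replicate (n draws with
    replacement) and predict by unweighted majority vote over all stumps."""
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: majority([m(x) for m in models], 0)


if __name__ == "__main__":
    xs = list(range(10))
    ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
    voted = bagging(xs, ys, n_models=25, rng=random.Random(0))
    print([voted(x) for x in xs])
```

To observe the variance reduction directly, one could draw many training sets from the same distribution and compare how often a single stump and the bagged vote change their prediction at a fixed test point; an odd number of models avoids ties in the binary vote.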

Similar resources

Evaluating the Efficiency of Decision Tree Models in Estimating Suspended River Sediment (Case Study: Ilam Dam Basin)

Accurate estimation of the volume of sediment carried by rivers is very important in water projects. Indeed, finding the best methods for calculating sediment discharge has been the objective of most research projects. Among these methods are machine learning approaches such as the decision tree model, which is based on learning principles. Decision...

Statistical Sources of Variable Selection Bias in Classification Tree Algorithms Based on the Gini Index

Evidence for variable selection bias in classification tree algorithms based on the Gini Index is reviewed from the literature and embedded into a broader explanatory scheme: Variable selection bias in classification tree algorithms based on the Gini Index can be caused not only by the statistical effect of multiple comparisons, but also by an increasing estimation bias and variance of the spli...
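The multiple-comparisons effect can be illustrated with a small simulation. A minimal sketch, assuming binary class labels, a single feature drawn independently of the labels, and candidate splits at midpoints between distinct feature values. The names `gini`, `best_gini_gain`, and `mean_best_gain` are hypothetical, and this is not the procedure of the cited study. A feature with more distinct values offers more candidate splits, so the best Gini gain it achieves by pure chance tends to be larger, even though it carries no information about the class.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k**2 of a list of class labels (0.0 if empty)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_gini_gain(xs, ys):
    """Largest Gini impurity decrease over all binary splits `x <= t` of one
    feature; each boundary between distinct values is one candidate split."""
    parent, n = gini(ys), len(ys)
    vals = sorted(set(xs))
    best = 0.0
    for a, b in zip(vals, vals[1:]):
        t = (a + b) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        gain = parent - len(left) / n * gini(left) - len(right) / n * gini(right)
        best = max(best, gain)
    return best


def mean_best_gain(n_values, n=50, trials=200, seed=0):
    """Average best gain of an uninformative feature with up to `n_values`
    distinct values, over `trials` data sets with random binary labels."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        ys = [rng.randrange(2) for _ in range(n)]
        xs = [rng.randrange(n_values) for _ in range(n)]  # independent of ys
        total += best_gini_gain(xs, ys)
    return total / trials


if __name__ == "__main__":
    for k in (2, 10, 50):
        print(k, round(mean_best_gain(k), 4))
```

Because each feature's score is the maximum over its candidate splits, a greedy split selector comparing these maxima will favor many-valued features more often than chance would suggest.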

Appendix: Machine Learning Bias Versus Statistical Bias

… is … if … and 0 if …. This high variance may help to explain why there is selection pressure for weak (machine learning) bias when the (machine learning) bias correctness is low. The reason that statisticians are interested in (statistical) bias and variance is that squared error is equal to the sum of squared (statistical) bias and variance. Therefore minimal (statistical) bias and minimal variance ...
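The decomposition in this excerpt can be checked numerically. A minimal sketch, assuming a noise-free target (so no irreducible-error term appears) and two toy learners chosen only for illustration: `sample_mean` and a deliberately biased `shrunk_mean`. These names, like `decompose` and `simulate`, are hypothetical. Training each learner on many independent samples, the mean squared error of its predictions equals squared bias plus variance, and shrinking trades added bias for lower variance.

```python
import random
import statistics


def decompose(predictions, target):
    """Split the mean squared error of predictions for one noise-free target
    into (statistical) bias and variance. Population variance (1/n) is used,
    so mse == bias**2 + variance up to floating-point rounding."""
    mean_pred = statistics.fmean(predictions)
    mse = statistics.fmean([(p - target) ** 2 for p in predictions])
    bias = mean_pred - target
    variance = statistics.pvariance(predictions, mu=mean_pred)
    return mse, bias, variance


def simulate(learner, theta=2.0, n=10, noise=1.0, trials=2000, seed=0):
    """Train `learner` on many independent noisy samples of the constant
    theta and decompose the error of its predictions."""
    rng = random.Random(seed)
    preds = [learner([theta + rng.gauss(0.0, noise) for _ in range(n)])
             for _ in range(trials)]
    return decompose(preds, theta)


# Hypothetical toy learners, for illustration only.
def sample_mean(sample):
    return statistics.fmean(sample)


def shrunk_mean(sample, k=0.5):
    """Deliberately biased learner: shrink the sample mean toward zero."""
    return k * statistics.fmean(sample)


if __name__ == "__main__":
    for name, learner in [("mean", sample_mean), ("shrunk", shrunk_mean)]:
        mse, bias, var = simulate(learner)
        print(f"{name}: mse={mse:.4f} bias^2={bias ** 2:.4f} variance={var:.4f}")
```

Both learners see identical samples because they share the seed, so the only difference in their error breakdown comes from the shrinkage factor `k`.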

Publication date: 1995